Overview

Brought to you by YData

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   437         426
Missing cells (%)               8.2%        8.0%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
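
The two derived rows in this table can be recomputed from the others. The following is a minimal sketch, not the profiler's own code: the helper names are hypothetical, and the formulas (missing cells over variables times observations, total size over observations with 1 KiB = 1024 B) are assumptions that happen to reproduce the reported figures.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells over all cells (variables x observations), in percent."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 B."""
    return total_kib * 1024 / n_observations


# Figures copied from the Dataset statistics table.
pct_a = missing_cell_pct(437, 12, 446)
pct_b = missing_cell_pct(426, 12, 446)
record_size = avg_record_size_bytes(45.3, 446)

print(f"Missing cells (%): A={pct_a:.1f}%, B={pct_b:.1f}%")
print(f"Average record size: {record_size:.1f} B")
```

Rounded to one decimal, these give 8.2%, 8.0% and 104.0 B, matching the table.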

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                                        Dataset B                                        Alert type
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High correlation
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High correlation
Age has 90 (20.2%) missing values                Age has 89 (20.0%) missing values                Missing
Cabin has 346 (77.6%) missing values             Cabin has 336 (75.3%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 300 (67.3%) zeros                      SibSp has 303 (67.9%) zeros                      Zeros
Parch has 328 (73.5%) zeros                      Parch has 337 (75.6%) zeros                      Zeros
Fare has 5 (1.1%) zeros                          Fare has 7 (1.6%) zeros                          Zeros
Alert not present in this dataset                Fare is highly overall correlated with Pclass    High correlation
Alert not present in this dataset                Pclass is highly overall correlated with Fare    High correlation

Reproduction

                         Dataset A                    Dataset B
Analysis started         2025-03-18 22:15:08.354409   2025-03-18 22:15:10.549153
Analysis finished        2025-03-18 22:15:10.546299   2025-03-18 22:15:12.733948
Duration                 2.19 seconds                 2.18 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
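
The reported durations are the difference between the start and finish timestamps. As a small illustration using only the standard library (the `duration_seconds` helper is a hypothetical name, not part of ydata-profiling):

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2025-03-18 22:15:08.354409", "2025-03-18 22:15:10.546299")
dur_b = duration_seconds("2025-03-18 22:15:10.549153", "2025-03-18 22:15:12.733948")

print(f"Dataset A: {dur_a:.2f} seconds")
print(f"Dataset B: {dur_b:.2f} seconds")
```

The two profiles ran back to back: Dataset B started about 3 ms after Dataset A finished.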

Variables

PassengerId
Real number (ℝ)

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           447.7287    466.7287
Minimum        8           1
Maximum        889         889
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     8           1
5-th percentile             56.25       54.5
Q1                          235.5       249.25
median                      452.5       475
Q3                          655.25      685.75
95-th percentile            844.5       842.75
Maximum                     889         889
Range                       881         888
Interquartile range (IQR)   419.75      436.5
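
Range and interquartile range are differences of other rows in this table. A minimal sketch of that arithmetic for PassengerId (the `spread` helper is a hypothetical name):

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range = maximum - minimum; IQR = Q3 - Q1."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles copied from the table.
spread_a = spread(minimum=8, q1=235.5, q3=655.25, maximum=889)
spread_b = spread(minimum=1, q1=249.25, q3=685.75, maximum=889)

print("Dataset A:", spread_a)
print("Dataset B:", spread_b)
```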

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                253.16026       252.4113
Coefficient of variation (CV)     0.56543228      0.54080947
Kurtosis                          -1.1520817      -1.1763698
Mean                              447.7287        466.7287
Median Absolute Deviation (MAD)   211             218
Skewness                          -0.028071942    -0.092120283
Sum                               199687          208161
Variance                          64090.117       63711.466
Monotonicity                      Not monotonic   Not monotonic
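
Two of these rows follow from the standard deviation and the mean: the coefficient of variation is their ratio, and the variance is the standard deviation squared. A short check against the PassengerId figures (inputs are the rounded values shown in the report, so results agree only up to that rounding):

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean."""
    return std / mean


def variance_from_std(std: float) -> float:
    """Variance = standard deviation squared."""
    return std ** 2


# PassengerId values copied from the table.
cv_a = coefficient_of_variation(253.16026, 447.7287)
cv_b = coefficient_of_variation(252.4113, 466.7287)
var_a = variance_from_std(253.16026)
var_b = variance_from_std(252.4113)

print(f"CV: A={cv_a:.6f}, B={cv_b:.6f}")
print(f"Variance: A={var_a:.2f}, B={var_b:.2f}")
```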
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
596 1
 
0.2%
555 1
 
0.2%
879 1
 
0.2%
185 1
 
0.2%
382 1
 
0.2%
504 1
 
0.2%
339 1
 
0.2%
616 1
 
0.2%
70 1
 
0.2%
64 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
249 1
 
0.2%
421 1
 
0.2%
721 1
 
0.2%
400 1
 
0.2%
302 1
 
0.2%
750 1
 
0.2%
65 1
 
0.2%
710 1
 
0.2%
92 1
 
0.2%
648 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
8 1
0.2%
9 1
0.2%
11 1
0.2%
12 1
0.2%
15 1
0.2%
16 1
0.2%
17 1
0.2%
18 1
0.2%
20 1
0.2%
23 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
3 1
0.2%
4 1
0.2%
7 1
0.2%
10 1
0.2%
12 1
0.2%
15 1
0.2%
16 1
0.2%
21 1
0.2%
22 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
3 1
0.2%
4 1
0.2%
7 1
0.2%
10 1
0.2%
12 1
0.2%
15 1
0.2%
16 1
0.2%
21 1
0.2%
22 1
0.2%
ValueCountFrequency (%)
8 1
0.2%
9 1
0.2%
11 1
0.2%
12 1
0.2%
15 1
0.2%
16 1
0.2%
17 1
0.2%
18 1
0.2%
20 1
0.2%
23 1
0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: 0 (271), 1 (175)
Value counts, Dataset B: 0 (263), 1 (183)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   1           0
2nd row   0           1
3rd row   1           1
4th row   1           1
5th row   0           0

Common Values

Dataset A
Value   Count   Frequency (%)
0       271     60.8%
1       175     39.2%

Dataset B
Value   Count   Frequency (%)
0       263     59.0%
1       183     41.0%
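
The Frequency (%) column is each count divided by the total number of rows. A minimal sketch for the Survived counts (the `frequency_pct` helper is a hypothetical name):

```python
def frequency_pct(counts: dict) -> dict:
    """Convert raw value counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


# Survived value counts copied from the tables.
freq_a = frequency_pct({0: 271, 1: 175})
freq_b = frequency_pct({0: 263, 1: 183})

print("Dataset A:", freq_a)
print("Dataset B:", freq_b)
```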

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
0 271
60.8%
1 175
39.2%
ValueCountFrequency (%)
0 263
59.0%
1 183
41.0%

Most occurring characters

ValueCountFrequency (%)
0 271
60.8%
1 175
39.2%
ValueCountFrequency (%)
0 263
59.0%
1 183
41.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 271
60.8%
1 175
39.2%
ValueCountFrequency (%)
0 263
59.0%
1 183
41.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 271
60.8%
1 175
39.2%
ValueCountFrequency (%)
0 263
59.0%
1 183
41.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 271
60.8%
1 175
39.2%
ValueCountFrequency (%)
0 263
59.0%
1 183
41.0%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: 3 (252), 1 (105), 2 (89)
Value counts, Dataset B: 3 (236), 1 (110), 2 (100)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   3           3
2nd row   3           2
3rd row   3           2
4th row   3           3
5th row   3           3

Common Values

ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring characters

ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          82
Median length   52          46
Mean length     26.82287    27.289238
Min length      12          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      11963       12171
Distinct characters   60          60
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                             Dataset B
1st row   Ohman, Miss. Velin                    Gheorgheff, Mr. Stanio
2nd row   Laleff, Mr. Kristo                    Harper, Miss. Annie Jessie "Nina"
3rd row   Kink-Heilmann, Miss. Luise Gretchen   Trout, Mrs. William H (Jessie L)
4th row   Nakid, Miss. Maria ("Mary")           McCoy, Mr. Bernard
5th row   Laitinen, Miss. Kristina Sofia        Connaghton, Mr. Michael
ValueCountFrequency (%)
mr 256
 
14.2%
miss 98
 
5.4%
mrs 64
 
3.5%
william 26
 
1.4%
master 21
 
1.2%
john 20
 
1.1%
george 17
 
0.9%
edward 15
 
0.8%
charles 13
 
0.7%
henry 12
 
0.7%
Other values (891) 1262
70.0%
ValueCountFrequency (%)
mr 258
 
14.0%
miss 84
 
4.6%
mrs 71
 
3.9%
william 28
 
1.5%
john 25
 
1.4%
master 22
 
1.2%
henry 19
 
1.0%
charles 15
 
0.8%
james 11
 
0.6%
george 11
 
0.6%
Other values (886) 1300
70.5%

Most occurring characters

ValueCountFrequency (%)
1359
 
11.4%
r 983
 
8.2%
e 848
 
7.1%
a 827
 
6.9%
s 663
 
5.5%
i 645
 
5.4%
n 616
 
5.1%
M 569
 
4.8%
l 529
 
4.4%
o 504
 
4.2%
Other values (50) 4420
36.9%
ValueCountFrequency (%)
1399
 
11.5%
r 1000
 
8.2%
e 882
 
7.2%
a 842
 
6.9%
n 659
 
5.4%
s 653
 
5.4%
i 651
 
5.3%
M 578
 
4.7%
l 552
 
4.5%
o 509
 
4.2%
Other values (50) 4446
36.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 11963
100.0%
ValueCountFrequency (%)
(unknown) 12171
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1359
 
11.4%
r 983
 
8.2%
e 848
 
7.1%
a 827
 
6.9%
s 663
 
5.5%
i 645
 
5.4%
n 616
 
5.1%
M 569
 
4.8%
l 529
 
4.4%
o 504
 
4.2%
Other values (50) 4420
36.9%
ValueCountFrequency (%)
1399
 
11.5%
r 1000
 
8.2%
e 882
 
7.2%
a 842
 
6.9%
n 659
 
5.4%
s 653
 
5.4%
i 651
 
5.3%
M 578
 
4.7%
l 552
 
4.5%
o 509
 
4.2%
Other values (50) 4446
36.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 11963
100.0%
ValueCountFrequency (%)
(unknown) 12171
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1359
 
11.4%
r 983
 
8.2%
e 848
 
7.1%
a 827
 
6.9%
s 663
 
5.5%
i 645
 
5.4%
n 616
 
5.1%
M 569
 
4.8%
l 529
 
4.4%
o 504
 
4.2%
Other values (50) 4420
36.9%
ValueCountFrequency (%)
1399
 
11.5%
r 1000
 
8.2%
e 882
 
7.2%
a 842
 
6.9%
n 659
 
5.4%
s 653
 
5.4%
i 651
 
5.3%
M 578
 
4.7%
l 552
 
4.5%
o 509
 
4.2%
Other values (50) 4446
36.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 11963
100.0%
ValueCountFrequency (%)
(unknown) 12171
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1359
 
11.4%
r 983
 
8.2%
e 848
 
7.1%
a 827
 
6.9%
s 663
 
5.5%
i 645
 
5.4%
n 616
 
5.1%
M 569
 
4.8%
l 529
 
4.4%
o 504
 
4.2%
Other values (50) 4420
36.9%
ValueCountFrequency (%)
1399
 
11.5%
r 1000
 
8.2%
e 882
 
7.2%
a 842
 
6.9%
n 659
 
5.4%
s 653
 
5.4%
i 651
 
5.3%
M 578
 
4.7%
l 552
 
4.5%
o 509
 
4.2%
Other values (50) 4446
36.5%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: male (284), female (162)
Value counts, Dataset B: male (291), female (155)

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7264574   4.6950673
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2108        2094
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   female      male
2nd row   male        female
3rd row   female      female
4th row   female      male
5th row   female      male

Common Values

ValueCountFrequency (%)
male 284
63.7%
female 162
36.3%
ValueCountFrequency (%)
male 291
65.2%
female 155
34.8%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
male 284
63.7%
female 162
36.3%
ValueCountFrequency (%)
male 291
65.2%
female 155
34.8%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       608     28.8%
m       446     21.2%
a       446     21.2%
l       446     21.2%
f       162     7.7%

Dataset B
Value   Count   Frequency (%)
e       601     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       155     7.4%
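
These character counts can be derived from the Sex value counts alone, since every row is either "male" or "female". The sketch below uses only the standard library; `char_counts` is a hypothetical helper, not a ydata-profiling function.

```python
from collections import Counter


def char_counts(value_counts: dict) -> Counter:
    """Weight each value's per-character counts by how often the value occurs."""
    totals = Counter()
    for value, n_rows in value_counts.items():
        for char, per_value in Counter(value).items():
            totals[char] += per_value * n_rows
    return totals


# Sex value counts copied from the Common Values tables.
chars_a = char_counts({"male": 284, "female": 162})
chars_b = char_counts({"male": 291, "female": 155})

print("Dataset A:", dict(chars_a), "total =", sum(chars_a.values()))
print("Dataset B:", dict(chars_b), "total =", sum(chars_b.values()))
```

The letter "e" appears once in "male" and twice in "female", which is why it leads the table (284 + 2 x 162 = 608 in Dataset A). The totals, 2108 and 2094, match the Total characters row.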

Most occurring categories

ValueCountFrequency (%)
(unknown) 2108
100.0%
ValueCountFrequency (%)
(unknown) 2094
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2108
100.0%
ValueCountFrequency (%)
(unknown) 2094
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2108
100.0%
ValueCountFrequency (%)
(unknown) 2094
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Age
Real number (ℝ)

               Dataset A   Dataset B
Distinct       75          80
Distinct (%)   21.1%       22.4%
Missing        90          89
Missing (%)    20.2%       20.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.233848   29.598515
Minimum        0.67        0.67
Maximum        80          71
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB
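
The Age percentages use two different denominators. Missing (%) is taken over all 446 rows, while the Distinct (%) figures are consistent with a denominator of non-missing rows only. That second point is an inference from the numbers, not something the report states; the sketch below simply checks it.

```python
N_OBSERVATIONS = 446  # rows per dataset, from the Overview

# Age counts copied from the table.
age = {
    "A": {"distinct": 75, "missing": 90},
    "B": {"distinct": 80, "missing": 89},
}

age_pct = {}
for name, stats in age.items():
    present = N_OBSERVATIONS - stats["missing"]
    age_pct[name] = {
        # Missing share over all rows.
        "missing_pct": round(100.0 * stats["missing"] / N_OBSERVATIONS, 1),
        # Assumption: distinct share over non-missing rows only.
        "distinct_pct": round(100.0 * stats["distinct"] / present, 1),
    }

print(age_pct)
```

The same assumption also reproduces the Cabin figures (90 distinct of 100 present values, 93 of 110).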

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.67        0.67
5-th percentile             4           4
Q1                          19          20
median                      28          28
Q3                          38.25       39
95-th percentile            57.25       57.2
Maximum                     80          71
Range                       79.33       70.33
Interquartile range (IQR)   19.25       19

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                15.02385        14.648748
Coefficient of variation (CV)     0.51391968      0.49491496
Kurtosis                          0.2882204       -0.01277809
Mean                              29.233848       29.598515
Median Absolute Deviation (MAD)   9               9
Skewness                          0.47632867      0.36455786
Sum                               10407.25        10566.67
Variance                          225.71607       214.58582
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
18 16
 
3.6%
24 14
 
3.1%
22 13
 
2.9%
19 13
 
2.9%
28 12
 
2.7%
30 12
 
2.7%
32 11
 
2.5%
21 11
 
2.5%
20 11
 
2.5%
16 11
 
2.5%
Other values (65) 232
52.0%
(Missing) 90
 
20.2%
ValueCountFrequency (%)
24 17
 
3.8%
25 15
 
3.4%
19 14
 
3.1%
18 13
 
2.9%
31 12
 
2.7%
27 12
 
2.7%
22 12
 
2.7%
28 11
 
2.5%
29 11
 
2.5%
23 11
 
2.5%
Other values (70) 229
51.3%
(Missing) 89
 
20.0%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.83 2
 
0.4%
0.92 1
 
0.2%
1 3
0.7%
2 6
1.3%
3 4
0.9%
4 6
1.3%
5 3
0.7%
6 2
 
0.4%
7 2
 
0.4%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 1
 
0.2%
0.92 1
 
0.2%
1 3
0.7%
2 4
0.9%
3 3
0.7%
4 5
1.1%
5 4
0.9%
6 2
 
0.4%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 1
 
0.2%
0.92 1
 
0.2%
1 3
0.7%
2 4
0.9%
3 3
0.7%
4 5
1.1%
5 4
0.9%
6 2
 
0.4%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.83 2
 
0.4%
0.92 1
 
0.2%
1 3
0.7%
2 6
1.3%
3 4
0.9%
4 6
1.3%
5 3
0.7%
6 2
 
0.4%
7 2
 
0.4%

SibSp
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.57847534   0.47982063
Minimum        0            0
Maximum        8            8
Zeros          300          303
Zeros (%)      67.3%        67.9%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.196205        0.96141093
Coefficient of variation (CV)     2.0678583       2.0036882
Kurtosis                          14.377305       19.451332
Mean                              0.57847534      0.47982063
Median Absolute Deviation (MAD)   0               0
Skewness                          3.3821985       3.6544422
Sum                               258             214
Variance                          1.4309064       0.92431098
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Dataset A
Value   Count   Frequency (%)
0       300     67.3%
1       103     23.1%
2       13      2.9%
3       11      2.5%
4       11      2.5%
8       4       0.9%
5       4       0.9%

Dataset B
Value   Count   Frequency (%)
0       303     67.9%
1       110     24.7%
2       15      3.4%
4       8       1.8%
3       7       1.6%
8       2       0.4%
5       1       0.2%
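
Because SibSp takes only seven distinct values, its sum, mean and variance can be rebuilt from these value counts. The sketch below uses the standard library's `statistics` module; the reported variance matches the sample variance (n - 1 denominator), which is an observation from the numbers rather than a documented setting.

```python
import statistics


def expand(value_counts: dict) -> list:
    """Turn {value: count} into a flat list with each value repeated count times."""
    return [value for value, n in value_counts.items() for _ in range(n)]


# SibSp value counts copied from the tables.
sibsp_a = expand({0: 300, 1: 103, 2: 13, 3: 11, 4: 11, 8: 4, 5: 4})
sibsp_b = expand({0: 303, 1: 110, 2: 15, 4: 8, 3: 7, 8: 2, 5: 1})

sum_a, sum_b = sum(sibsp_a), sum(sibsp_b)
mean_a, mean_b = statistics.mean(sibsp_a), statistics.mean(sibsp_b)
# statistics.variance is the sample variance (divides by n - 1).
var_a, var_b = statistics.variance(sibsp_a), statistics.variance(sibsp_b)

print(f"A: sum={sum_a}, mean={mean_a:.8f}, variance={var_a:.7f}")
print(f"B: sum={sum_b}, mean={mean_b:.8f}, variance={var_b:.8f}")
```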
ValueCountFrequency (%)
0 300
67.3%
1 103
 
23.1%
2 13
 
2.9%
3 11
 
2.5%
4 11
 
2.5%
5 4
 
0.9%
8 4
 
0.9%
ValueCountFrequency (%)
0 303
67.9%
1 110
 
24.7%
2 15
 
3.4%
3 7
 
1.6%
4 8
 
1.8%
5 1
 
0.2%
8 2
 
0.4%
ValueCountFrequency (%)
0 303
67.9%
1 110
 
24.7%
2 15
 
3.4%
3 7
 
1.6%
4 8
 
1.8%
5 1
 
0.2%
8 2
 
0.4%
ValueCountFrequency (%)
0 300
67.3%
1 103
 
23.1%
2 13
 
2.9%
3 11
 
2.5%
4 11
 
2.5%
5 4
 
0.9%
8 4
 
0.9%

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.40358744   0.39910314
Minimum        0            0
Maximum        5            6
Zeros          328          337
Zeros (%)      73.5%        75.6%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   1           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.77511037      0.8519108
Coefficient of variation (CV)     1.9205513       2.134563
Kurtosis                          6.4814512       11.344166
Mean                              0.40358744      0.39910314
Median Absolute Deviation (MAD)   0               0
Skewness                          2.2732871       2.9462133
Sum                               180             178
Variance                          0.60079609      0.725752
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 328
73.5%
1 67
 
15.0%
2 45
 
10.1%
3 3
 
0.7%
5 2
 
0.4%
4 1
 
0.2%
ValueCountFrequency (%)
0 337
75.6%
1 60
 
13.5%
2 41
 
9.2%
5 4
 
0.9%
3 2
 
0.4%
6 1
 
0.2%
4 1
 
0.2%
ValueCountFrequency (%)
0 328
73.5%
1 67
 
15.0%
2 45
 
10.1%
3 3
 
0.7%
4 1
 
0.2%
5 2
 
0.4%
ValueCountFrequency (%)
0 337
75.6%
1 60
 
13.5%
2 41
 
9.2%
3 2
 
0.4%
4 1
 
0.2%
5 4
 
0.9%
6 1
 
0.2%
ValueCountFrequency (%)
0 337
75.6%
1 60
 
13.5%
2 41
 
9.2%
3 2
 
0.4%
4 1
 
0.2%
5 4
 
0.9%
6 1
 
0.2%
ValueCountFrequency (%)
0 328
73.5%
1 67
 
15.0%
2 45
 
10.1%
3 3
 
0.7%
4 1
 
0.2%
5 2
 
0.4%

Ticket
Text

               Dataset A   Dataset B
Distinct       373         374
Distinct (%)   83.6%       83.9%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.6883408   6.7869955
Min length      3           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2983        3027
Distinct characters   34          35
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       320         313
Unique (%)   71.7%       70.2%

Sample

          Dataset A   Dataset B
1st row   347085      349254
2nd row   349217      248727
3rd row   315153      240929
4th row   2653        367226
5th row   4135        335097
ValueCountFrequency (%)
pc 30
 
5.3%
c.a 11
 
2.0%
ca 8
 
1.4%
2 6
 
1.1%
ston/o 6
 
1.1%
a/5 6
 
1.1%
1601 5
 
0.9%
soton/o.q 5
 
0.9%
w./c 4
 
0.7%
f.c.c 4
 
0.7%
Other values (391) 476
84.8%
ValueCountFrequency (%)
pc 28
 
4.9%
c.a 11
 
1.9%
a/5 10
 
1.8%
2 8
 
1.4%
ston/o 8
 
1.4%
sc/paris 6
 
1.1%
w./c 5
 
0.9%
soton/o.q 5
 
0.9%
a/4 4
 
0.7%
ca 4
 
0.7%
Other values (395) 479
84.3%

Most occurring characters

ValueCountFrequency (%)
3 373
12.5%
1 333
11.2%
2 286
9.6%
7 271
9.1%
4 243
8.1%
6 210
 
7.0%
0 204
 
6.8%
5 197
 
6.6%
9 153
 
5.1%
8 131
 
4.4%
Other values (24) 582
19.5%
ValueCountFrequency (%)
3 363
12.0%
1 356
11.8%
2 294
9.7%
7 242
 
8.0%
4 235
 
7.8%
6 220
 
7.3%
0 201
 
6.6%
5 183
 
6.0%
9 161
 
5.3%
8 146
 
4.8%
Other values (25) 626
20.7%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2983
100.0%
ValueCountFrequency (%)
(unknown) 3027
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 373
12.5%
1 333
11.2%
2 286
9.6%
7 271
9.1%
4 243
8.1%
6 210
 
7.0%
0 204
 
6.8%
5 197
 
6.6%
9 153
 
5.1%
8 131
 
4.4%
Other values (24) 582
19.5%
ValueCountFrequency (%)
3 363
12.0%
1 356
11.8%
2 294
9.7%
7 242
 
8.0%
4 235
 
7.8%
6 220
 
7.3%
0 201
 
6.6%
5 183
 
6.0%
9 161
 
5.3%
8 146
 
4.8%
Other values (25) 626
20.7%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2983
100.0%
ValueCountFrequency (%)
(unknown) 3027
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 373
12.5%
1 333
11.2%
2 286
9.6%
7 271
9.1%
4 243
8.1%
6 210
 
7.0%
0 204
 
6.8%
5 197
 
6.6%
9 153
 
5.1%
8 131
 
4.4%
Other values (24) 582
19.5%
ValueCountFrequency (%)
3 363
12.0%
1 356
11.8%
2 294
9.7%
7 242
 
8.0%
4 235
 
7.8%
6 220
 
7.3%
0 201
 
6.6%
5 183
 
6.0%
9 161
 
5.3%
8 146
 
4.8%
Other values (25) 626
20.7%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2983
100.0%
ValueCountFrequency (%)
(unknown) 3027
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 373
12.5%
1 333
11.2%
2 286
9.6%
7 271
9.1%
4 243
8.1%
6 210
 
7.0%
0 204
 
6.8%
5 197
 
6.6%
9 153
 
5.1%
8 131
 
4.4%
Other values (24) 582
19.5%
ValueCountFrequency (%)
3 363
12.0%
1 356
11.8%
2 294
9.7%
7 242
 
8.0%
4 235
 
7.8%
6 220
 
7.3%
0 201
 
6.6%
5 183
 
6.0%
9 161
 
5.3%
8 146
 
4.8%
Other values (25) 626
20.7%

Fare
Real number (ℝ)

               Dataset A   Dataset B
Distinct       186         184
Distinct (%)   41.7%       41.3%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           33.8152     29.989172
Minimum        0           0
Maximum        512.3292    263
Zeros          5           7
Zeros (%)      1.1%        1.6%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.2292      7.129175
Q1                          8.05        7.925
median                      15.2458     15.2458
Q3                          32.087475   31.275
95-th percentile            110.8833    90.8094
Maximum                     512.3292    263
Range                       512.3292    263
Interquartile range (IQR)   24.037475   23.35

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                52.552199       37.816109
Coefficient of variation (CV)     1.5540999       1.2609921
Kurtosis                          33.442384       14.164912
Mean                              33.8152         29.989172
Median Absolute Deviation (MAD)   7.5062          7.9958
Skewness                          4.8354743       3.3008988
Sum                               15081.579       13375.171
Variance                          2761.7336       1430.0581
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8.05 24
 
5.4%
7.8958 18
 
4.0%
13 18
 
4.0%
26 17
 
3.8%
7.75 16
 
3.6%
10.5 11
 
2.5%
7.925 10
 
2.2%
7.775 9
 
2.0%
7.8542 8
 
1.8%
8.6625 8
 
1.8%
Other values (176) 307
68.8%
ValueCountFrequency (%)
8.05 22
 
4.9%
13 22
 
4.9%
7.75 20
 
4.5%
26 18
 
4.0%
7.8958 12
 
2.7%
10.5 12
 
2.7%
7.925 8
 
1.8%
7.225 7
 
1.6%
0 7
 
1.6%
7.775 7
 
1.6%
Other values (174) 311
69.7%
ValueCountFrequency (%)
0 5
1.1%
4.0125 1
 
0.2%
5 1
 
0.2%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 3
0.7%
7.0542 1
 
0.2%
7.125 1
 
0.2%
ValueCountFrequency (%)
0 7
1.6%
4.0125 1
 
0.2%
6.2375 1
 
0.2%
6.45 1
 
0.2%
6.4958 2
 
0.4%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 2
 
0.4%
7.05 4
0.9%
7.125 3
0.7%
ValueCountFrequency (%)
0 7
1.6%
4.0125 1
 
0.2%
6.2375 1
 
0.2%
6.45 1
 
0.2%
6.4958 2
 
0.4%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 2
 
0.4%
7.05 4
0.9%
7.125 3
0.7%
ValueCountFrequency (%)
0 5
1.1%
4.0125 1
 
0.2%
5 1
 
0.2%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 3
0.7%
7.0542 1
 
0.2%
7.125 1
 
0.2%

Cabin
Text

               Dataset A   Dataset B
Distinct       90          93
Distinct (%)   90.0%       84.5%
Missing        346         336
Missing (%)    77.6%       75.3%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      11          15
Median length   3           3
Mean length     3.49        3.5181818
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      349         387
Distinct characters   18          19
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       81          77
Unique (%)   81.0%       70.0%

Sample

          Dataset A   Dataset B
1st row   E33         A26
2nd row   B50         C46
3rd row   D19         E44
4th row   B101        D36
5th row   B20         C82
ValueCountFrequency (%)
d 3
 
2.6%
e33 2
 
1.8%
c52 2
 
1.8%
g6 2
 
1.8%
c78 2
 
1.8%
b77 2
 
1.8%
b58 2
 
1.8%
b60 2
 
1.8%
b51 2
 
1.8%
b53 2
 
1.8%
Other values (91) 93
81.6%
ValueCountFrequency (%)
g6 3
 
2.4%
f 3
 
2.4%
b77 2
 
1.6%
c123 2
 
1.6%
c22 2
 
1.6%
c26 2
 
1.6%
c2 2
 
1.6%
b20 2
 
1.6%
c92 2
 
1.6%
b96 2
 
1.6%
Other values (94) 105
82.7%

Most occurring characters

ValueCountFrequency (%)
1 35
10.0%
C 35
10.0%
B 34
9.7%
2 33
 
9.5%
5 27
 
7.7%
3 25
 
7.2%
6 22
 
6.3%
0 19
 
5.4%
8 18
 
5.2%
D 18
 
5.2%
Other values (8) 83
23.8%
ValueCountFrequency (%)
2 42
 
10.9%
C 37
 
9.6%
1 34
 
8.8%
B 29
 
7.5%
6 29
 
7.5%
4 26
 
6.7%
3 25
 
6.5%
E 21
 
5.4%
D 20
 
5.2%
7 19
 
4.9%
Other values (9) 105
27.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 349
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
1 35
10.0%
C 35
10.0%
B 34
9.7%
2 33
 
9.5%
5 27
 
7.7%
3 25
 
7.2%
6 22
 
6.3%
0 19
 
5.4%
8 18
 
5.2%
D 18
 
5.2%
Other values (8) 83
23.8%
ValueCountFrequency (%)
2 42
 
10.9%
C 37
 
9.6%
1 34
 
8.8%
B 29
 
7.5%
6 29
 
7.5%
4 26
 
6.7%
3 25
 
6.5%
E 21
 
5.4%
D 20
 
5.2%
7 19
 
4.9%
Other values (9) 105
27.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 349
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
1 35
10.0%
C 35
10.0%
B 34
9.7%
2 33
 
9.5%
5 27
 
7.7%
3 25
 
7.2%
6 22
 
6.3%
0 19
 
5.4%
8 18
 
5.2%
D 18
 
5.2%
Other values (8) 83
23.8%
ValueCountFrequency (%)
2 42
 
10.9%
C 37
 
9.6%
1 34
 
8.8%
B 29
 
7.5%
6 29
 
7.5%
4 26
 
6.7%
3 25
 
6.5%
E 21
 
5.4%
D 20
 
5.2%
7 19
 
4.9%
Other values (9) 105
27.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 349
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
1 35
10.0%
C 35
10.0%
B 34
9.7%
2 33
 
9.5%
5 27
 
7.7%
3 25
 
7.2%
6 22
 
6.3%
0 19
 
5.4%
8 18
 
5.2%
D 18
 
5.2%
Other values (8) 83
23.8%
ValueCountFrequency (%)
2 42
 
10.9%
C 37
 
9.6%
1 34
 
8.8%
B 29
 
7.5%
6 29
 
7.5%
4 26
 
6.7%
3 25
 
6.5%
E 21
 
5.4%
D 20
 
5.2%
7 19
 
4.9%
Other values (9) 105
27.1%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           1
Missing (%)    0.2%        0.2%
Memory size    7.0 KiB     7.0 KiB

Value counts, Dataset A: S (318), C (85), Q (42)
Value counts, Dataset B: S (318), C (90), Q (37)

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           C
2nd row   S           S
3rd row   S           S
4th row   C           Q
5th row   S           Q

Common Values

ValueCountFrequency (%)
S 318
71.3%
C 85
 
19.1%
Q 42
 
9.4%
(Missing) 1
 
0.2%
ValueCountFrequency (%)
S 318
71.3%
C 90
 
20.2%
Q 37
 
8.3%
(Missing) 1
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
s 318
71.5%
c 85
 
19.1%
q 42
 
9.4%
ValueCountFrequency (%)
s 318
71.5%
c 90
 
20.2%
q 37
 
8.3%

Most occurring characters

ValueCountFrequency (%)
S 318
71.5%
C 85
 
19.1%
Q 42
 
9.4%
ValueCountFrequency (%)
S 318
71.5%
C 90
 
20.2%
Q 37
 
8.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 318
71.5%
C 85
 
19.1%
Q 42
 
9.4%
ValueCountFrequency (%)
S 318
71.5%
C 90
 
20.2%
Q 37
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 318
71.5%
C 85
 
19.1%
Q 42
 
9.4%
ValueCountFrequency (%)
S 318
71.5%
C 90
 
20.2%
Q 37
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 445
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 318
71.5%
C 85
 
19.1%
Q 42
 
9.4%
ValueCountFrequency (%)
S 318
71.5%
C 90
 
20.2%
Q 37
 
8.3%

Interactions

[Figure: interaction plots, one panel each for Dataset A and Dataset B]

2025-03-18T22:15:11.333410image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-18T22:15:09.513529image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-18T22:15:11.661495image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-18T22:15:09.927832image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-18T22:15:12.123942image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

[Correlation plot images for Dataset A and Dataset B omitted.]

Dataset A

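| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
| Age | 1.000 | 0.114 | 0.082 | -0.308 | 0.046 | 0.205 | 0.108 | -0.181 | 0.142 |
| Embarked | 0.114 | 1.000 | 0.224 | 0.027 | 0.000 | 0.280 | 0.066 | 0.098 | 0.161 |
| Fare | 0.082 | 0.224 | 1.000 | 0.380 | -0.053 | 0.494 | 0.160 | 0.417 | 0.256 |
| Parch | -0.308 | 0.027 | 0.380 | 1.000 | -0.015 | 0.000 | 0.263 | 0.429 | 0.111 |
| PassengerId | 0.046 | 0.000 | -0.053 | -0.015 | 1.000 | 0.000 | 0.083 | -0.044 | 0.178 |
| Pclass | 0.205 | 0.280 | 0.494 | 0.000 | 0.000 | 1.000 | 0.087 | 0.143 | 0.318 |
| Sex | 0.108 | 0.066 | 0.160 | 0.263 | 0.083 | 0.087 | 1.000 | 0.169 | 0.533 |
| SibSp | -0.181 | 0.098 | 0.417 | 0.429 | -0.044 | 0.143 | 0.169 | 1.000 | 0.130 |
| Survived | 0.142 | 0.161 | 0.256 | 0.111 | 0.178 | 0.318 | 0.533 | 0.130 | 1.000 |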

Dataset B

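| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
| Age | 1.000 | 0.000 | 0.092 | -0.278 | 0.076 | 0.235 | 0.055 | -0.175 | 0.169 |
| Embarked | 0.000 | 1.000 | 0.256 | 0.000 | 0.023 | 0.288 | 0.107 | 0.093 | 0.186 |
| Fare | 0.092 | 0.256 | 1.000 | 0.401 | 0.048 | 0.564 | 0.210 | 0.481 | 0.297 |
| Parch | -0.278 | 0.000 | 0.401 | 1.000 | 0.056 | 0.072 | 0.256 | 0.430 | 0.136 |
| PassengerId | 0.076 | 0.023 | 0.048 | 0.056 | 1.000 | 0.000 | 0.127 | -0.009 | 0.107 |
| Pclass | 0.235 | 0.288 | 0.564 | 0.072 | 0.000 | 1.000 | 0.160 | 0.164 | 0.354 |
| Sex | 0.055 | 0.107 | 0.210 | 0.256 | 0.127 | 0.160 | 1.000 | 0.194 | 0.534 |
| SibSp | -0.175 | 0.093 | 0.481 | 0.430 | -0.009 | 0.164 | 0.194 | 1.000 | 0.218 |
| Survived | 0.169 | 0.186 | 0.297 | 0.136 | 0.107 | 0.354 | 0.534 | 0.218 | 1.000 |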

Missing values

[Figures for Dataset A and Dataset B omitted] A simple visualization of nullity by column.

[Figures for Dataset A and Dataset B omitted] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

[Figures for Dataset A and Dataset B omitted] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
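
As a rough illustration of the nullity correlation described above, the sketch below correlates per-column missingness indicators with pandas. It is a minimal example under stated assumptions, not the code that produced this report: the helper name `nullity_correlation` and the toy data frame are hypothetical, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Correlate the missingness indicators of the columns of df.

    Hypothetical helper for illustration only.
    """
    # 1 where a value is missing, 0 where it is present.
    mask = df.isnull().astype(int)
    # Columns that are always present or always missing have zero variance,
    # so their correlation is undefined; leave them out.
    varying = [c for c in mask.columns if 0 < mask[c].sum() < len(mask)]
    return mask[varying].corr()


# Toy data (assumed, not taken from the report): Age and Cabin are
# missing in exactly the same rows, Fare is never missing.
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 4.0, np.nan],
        "Cabin": ["C85", None, "E46", None],
        "Fare": [7.25, 8.05, 16.70, 7.90],
    }
)
result = nullity_correlation(toy)
print(result)
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.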

Sample

Dataset A

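| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 554 | 555 | 1 | 3 | Ohman, Miss. Velin | female | 22.0 | 0 | 0 | 347085 | 7.7750 | NaN | S |
| 878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
| 184 | 185 | 1 | 3 | Kink-Heilmann, Miss. Luise Gretchen | female | 4.0 | 0 | 2 | 315153 | 22.0250 | NaN | S |
| 381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.0 | 0 | 2 | 2653 | 15.7417 | NaN | C |
| 503 | 504 | 0 | 3 | Laitinen, Miss. Kristina Sofia | female | 37.0 | 0 | 0 | 4135 | 9.5875 | NaN | S |
| 338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S |
| 615 | 616 | 1 | 2 | Herman, Miss. Alice | female | 24.0 | 1 | 2 | 220845 | 65.0000 | NaN | S |
| 69 | 70 | 0 | 3 | Kink, Mr. Vincenz | male | 26.0 | 2 | 0 | 315151 | 8.6625 | NaN | S |
| 63 | 64 | 0 | 3 | Skoog, Master. Harald | male | 4.0 | 3 | 2 | 347088 | 27.9000 | NaN | S |
| 778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN | 0 | 0 | 36865 | 7.7375 | NaN | Q |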

Dataset B

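| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 420 | 421 | 0 | 3 | Gheorgheff, Mr. Stanio | male | NaN | 0 | 0 | 349254 | 7.8958 | NaN | C |
| 720 | 721 | 1 | 2 | Harper, Miss. Annie Jessie "Nina" | female | 6.0 | 0 | 1 | 248727 | 33.0000 | NaN | S |
| 399 | 400 | 1 | 2 | Trout, Mrs. William H (Jessie L) | female | 28.0 | 0 | 0 | 240929 | 12.6500 | NaN | S |
| 301 | 302 | 1 | 3 | McCoy, Mr. Bernard | male | NaN | 2 | 0 | 367226 | 23.2500 | NaN | Q |
| 749 | 750 | 0 | 3 | Connaghton, Mr. Michael | male | 31.0 | 0 | 0 | 335097 | 7.7500 | NaN | Q |
| 64 | 65 | 0 | 1 | Stewart, Mr. Albert A | male | NaN | 0 | 0 | PC 17605 | 27.7208 | NaN | C |
| 709 | 710 | 1 | 3 | Moubarek, Master. Halim Gonios ("William George") | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C |
| 91 | 92 | 0 | 3 | Andreasson, Mr. Paul Edvin | male | 20.0 | 0 | 0 | 347466 | 7.8542 | NaN | S |
| 647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C |
| 769 | 770 | 0 | 3 | Gronnestad, Mr. Daniel Danielsen | male | 32.0 | 0 | 0 | 8471 | 8.3625 | NaN | S |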

Dataset A

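| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 643 | 644 | 1 | 3 | Foo, Mr. Choong | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 626 | 627 | 0 | 2 | Kirkland, Rev. Charles Leonard | male | 57.0 | 0 | 0 | 219533 | 12.3500 | NaN | Q |
| 355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 681 | 682 | 1 | 1 | Hassab, Mr. Hammad | male | 27.0 | 0 | 0 | PC 17572 | 76.7292 | D49 | C |
| 542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 283 | 284 | 1 | 3 | Dorking, Mr. Edward Arthur | male | 19.0 | 0 | 0 | A/5. 10482 | 8.0500 | NaN | S |
| 54 | 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 | 0 | 1 | 113509 | 61.9792 | B30 | C |
| 529 | 530 | 0 | 2 | Hocking, Mr. Richard George | male | 23.0 | 2 | 1 | 29104 | 11.5000 | NaN | S |
| 595 | 596 | 0 | 3 | Van Impe, Mr. Jean Baptiste | male | 36.0 | 1 | 1 | 345773 | 24.1500 | NaN | S |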

Dataset B

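| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| 146 | 147 | 1 | 3 | Andersson, Mr. August Edvard ("Wennerstrom") | male | 27.0 | 0 | 0 | 350043 | 7.7958 | NaN | S |
| 734 | 735 | 0 | 2 | Troupiansky, Mr. Moses Aaron | male | 23.0 | 0 | 0 | 233639 | 13.0000 | NaN | S |
| 797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S |
| 249 | 250 | 0 | 2 | Carter, Rev. Ernest Courtenay | male | 54.0 | 1 | 0 | 244252 | 26.0000 | NaN | S |
| 187 | 188 | 1 | 1 | Romaine, Mr. Charles Hallace ("Mr C Rolmane") | male | 45.0 | 0 | 0 | 111428 | 26.5500 | NaN | S |
| 659 | 660 | 0 | 1 | Newell, Mr. Arthur Webster | male | 58.0 | 0 | 2 | 35273 | 113.2750 | D48 | C |
| 370 | 371 | 1 | 1 | Harder, Mr. George Achilles | male | 25.0 | 1 | 0 | 11765 | 55.4417 | E50 | C |
| 321 | 322 | 0 | 3 | Danoff, Mr. Yoto | male | 27.0 | 0 | 0 | 349219 | 7.8958 | NaN | S |
| 723 | 724 | 0 | 2 | Hodges, Mr. Henry Price | male | 50.0 | 0 | 0 | 250643 | 13.0000 | NaN | S |
| 248 | 249 | 1 | 1 | Beckwith, Mr. Richard Leonard | male | 37.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |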

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.